Sparse spectral graph analysis and its application to gastric cancer drug resistance-specific molecular interplays identification

Uncovering acquired drug resistance mechanisms has garnered considerable attention as drug resistance leads to treatment failure and death in patients with cancer. Although several bioinformatics studies developed various computational methodologies to uncover the drug resistance mechanisms in cancer chemotherapy, most studies were based on individual or differential gene expression analysis. However the single gene-based analysis is not enough, because perturbations in complex molecular networks are involved in anti-cancer drug resistance mechanisms. The main goal of this study is to reveal crucial molecular interplay that plays key roles in mechanism underlying acquired gastric cancer drug resistance. To uncover the mechanism and molecular characteristics of drug resistance, we propose a novel computational strategy that identified the differentially regulated gene networks. Our method measures dissimilarity of networks based on the eigenvalues of the Laplacian matrix. Especially, our strategy determined the networks’ eigenstructure based on sparse eigen loadings, thus, the only crucial features to describe the graph structure are involved in the eigenanalysis without noise disturbance. We incorporated the network biology knowledge into eigenanalysis based on the network-constrained regularization. Therefore, we can achieve a biologically reliable interpretation of the differentially regulated gene network identification. Monte Carlo simulations show the outstanding performances of the proposed methodology for differentially regulated gene network identification. We applied our strategy to gastric cancer drug-resistant-specific molecular interplays and related markers. The identified drug resistance markers are verified through the literature. Our results suggest that the suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays enriched in the Extracellular-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer. The developed strategy will be a useful tool to identify phenotype-specific molecular characteristics that can provide essential clues to uncover the complex cancer mechanism.


Introduction
Gastric cancer is a leading cause of cancer-related deaths worldwide.Chemotherapy is the standard gastric cancer treatment.However, chemotherapy is not always effective because cancer cells acquire resistance to targeted drugs.Several studies have been conducted to uncover drug resistance mechanisms in cancer chemotherapy in various research fields, including medicine, health sciences, and pharmaceutical sciences.Although many bioinformatics studies attempted to uncover the drug resistance mechanism and related molecular characteristics, most studies were based on individual or differential gene expression (DEG) analysis [1,2].However, the complex mechanisms underlying drug resistance in cancer cells cannot be understood by perturbation in a single gene because abnormalities in complex molecular networks cause the mechanisms involved in disease.Therefore, the single gene-based and DEG analysis is insufficient to uncover drug resistance mechanisms and molecular characteristics in cancer cells.
The main aim of this study is to disclose the crucial molecular interplay that is instrumental in the mechanism driving acquired drug resistance in gastric cancer.In order to effectively identify the crucial molecular interplay, we consider drug resistance-specific molecular characteristics identification based on differential gene network (ssDGN) analysis.Recently, several methods have been proposed to differential gene network analysis by comparing graph spectra [1,3,4].The studies identified differential networks based on eigenvalues of adjacency, Laplacian, and normalized Laplacian matrices.However, the eigenanalysis in these studies is based on fully dense loadings and thus suffers from the following drawbacks: "difficulty in interpretation of the eigenanalysis results, because eigenvalues are estimated as linear combinations of all variables" and "the results may be disturbed by noise features included in datasets".In order to effectively perform for differentially regulated gene network identification, we develop a novel computational strategy for sparse eigenanalysis of graph spectra called Net-sEVD.Our strategy incorporates sparsity into eigen loading estimation based on L 1 -type regulari zation.Thus, only crucial features to describe the graph structure are involved in the eigenanalysis.Furthermore, the sparse loadings lead to effective interpretation of the eigenanalysis results.To achieve a biologically reliable results, we also incorporated knowledge of network biology that "the genes linked in the networks may have similar biological functions" and "the hub genes linked with many genes are crucial markers to understand biological mechanisms" into the eigen loading estimation [5].Consequently, our strategy encourages coefficient similarity of the linked genes in the eigenloadings estimation, which leads to the simultaneous selection of the linked genes.Our analysis imposes a relatively small penalty on the hub genes.Thus, hub genes and their neighboring genes have relatively large coefficients and are easily selected.In summary, the spectral graph analysis is based on crucial sub-networks consisting of the hub genes and their neighboring genes.Therefore, we can perform a biologically reliable interpretation of the spectral graph analysis results.
Monte Carlo simulations were conducted to demonstrate the proposed method's effectiveness.The ssDGN method showed an outstanding performance in identifying differentially regulated gene networks in various scenarios of gene network structures.We applied ssDGN to identify differentially regulated gene networks between gastric cancer-drug-sensitive and -resistant cell lines based on the Sanger Genomics of Drug Sensitivity in Cancer (GDSC) dataset.We then identified gastric cancer drug resistance-specific molecular interplays and related markers.Most of the identified drug resistance markers have strong evidence as biomarkers for gastric cancer and its chemotherapy.Our results with Gene Ontology (GO) term pathway analysis uncovered that "extracellular matrix" (ECM)-related pathways are involved in drug resistance mechanisms, and the COL4A1, PXDN, and TGFBI genes act as drug resistance markers.We suggest that suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays enriched in the extracellular-related pathways may provide crucial clues to enhance gastric cancer's chemosensitivity.
The remainder of this paper is organized as follows: The methods section introduces the proposed method for sparse spectral graph analysis and ssDGN identification.We discuss simulation studies in the Monte Carlo simulations section.Finally, we describe the results of identifying gastric cancer drug resistance-specific gene networks.Conclusions are provided in the Discussion section.

Methods
We considered a gene regulatory network that is represented by a weighted undirected graph G = (V, E, W), where V = {1, . .., p} is the set of vertices corresponding to p genes, E 2 V × V (i.e., E = (i * j)) is the edge set, and W = (w ij ), (i, j) 2 E is the edge weight.The adjacency matrix A(G) is defined as The degree of a vertex i is defined as d i = ∑ j*i w ij , and the degree matrix D is given as the following diagonal matrix The Laplacian matrix is given as L 0 = D − A(G) and the normalized Laplacian matrix L = D −1/2 L 0 D −1/2 is defined as [3,5], The gene regulatory network can be described by A(G), L 0 and L.

Existing method to measure similarity of graph
Preliminaries.In this section, we discuss the similar matrix and their eigenvalues.The similar matrix is defined as follows (Definition 4.1 of [6]), Definition 1 The p × p matrices M Y and M N are said to be similar matrices if a nonsingular matrix V exists, such that We then consider the following property for the p × p matrix (Theorem 3.2 (d) of [6]).
Theorem 1 Let M Y be an p × p matrix.Then, the eigenvalues of VM Y V −1 are the same as the eigenvalues of M Y , if V is a nonsigular p × p matrix.
The definition of similar matrix and theorem imply that similar matrices M Y and M N have identical eigenvalues.
Graph similarity measure based on Laplacian matrices.The eigenanalysis has been used to describe network structures, and many methods have been developed to measure the networks' similarity/dissimilarity based on the eigenvalues of adjacency, Laplacian-, and normalized Laplacian matrices [1,3,7].The weighted graphs G Y and G N for phenotypes Y and N are represented by p × p matrices M Y and M N (e.g., precision matrix, Laplacian matrix, adjacency matrix, etc.), respectively.Suppose M Y and M N are similar matrices.It follows from Theorem 1 that the matrices M Y and M N have identical eigenvalues We suppose rank(A(G Y )) = rank(A(G Y )) = q � p. Wills and Meyer [3] introduced the following spectral distance to measure the dissimilarity of two graphs dðG Y ; G N Þ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where q is the number of eigenvalues for best rank approximation of A(G Y ) and A(G N ), and λ (A(G Y )) r and λ(A(G N )) r are r th eigenvalues of adjacency matrices of A(G Y ) and A(G N ), respectively.
Although various graph spectra-based strategies have been proposed to measure network similarity, the existing methods are based on fully dense loadings.However, the fully dense loadings lead to the following drawbacks • The eigenvalues are estimated as linear combinations of all variables, leading to difficulty interpreting the eigenanalysis results.
• The eigenanalysis can be disturbed by noisy features included in datasets.
To settle on the issues, we developed a novel strategy for sparse eigenanalysis to capture gene network structure.
Various methodologies have been developed to estimate sparse loadings.Zou et al. [8] proposed the following sparse PCA ð Â; BÞ ¼ arg min ð8Þ where γ 1 , γ 2 > 0 are regularization parameters for β r , r = 1, . .., q, A = [α 1 , α 2 , . .., α q ], B = [β 1 , β 2 , . .., β q ].Guo et al. [9] developed the novel sparse PCA based on fused loadings, ð Â; BÞ ¼ arg min jr s;t jkb s;r À signðr s;t Þb t;r k 1 g; subject to A T A ¼ I q ; ð9Þ where ρ s,t is the sample correlation between s th and t th features in L and sign(�) is the sign function.Croux et al. [10] proposed a robust version of the sparse PCA based on the strong measure of variance.A methodology for sparse loading estimation was developed by incorporating sparsity into singular value decomposition [11].However, the existing methods were developed purely from mathematical and algorithmic points without considering biological knowledge.Thus, biological interpretation of the results cannot be effectively performed.
To effectively and interpretably perform differentially regulated gene network identification, we proposed a novel strategy for sparse eigenanalysis of the normalized Laplacian matrix L, called Net-sEVD, based on network-constrained regularization [5,[12][13][14], where s * t indicates the set of gene pairs which are connected to s in the gene network.In our strategy, the computed Laplacian matrix L in (3) is used to input of the optimization problem in (10) and the eigenloadings B = [β 1 , β 2 , . .., β q ] are regularized by the normalized Laplacian matrix L. The second term of the objective function of our strategy is a quadratic Laplacian penalty, called a Dirichlet energy, that is used for quantifying the smoothness of the graph, and leads to identify important features by measuring their smoothness on the graph structure [15,16] In our strategy, the quadratic Laplacian penalty used to induce a smooth solution of the parameters β r between neighboring vertices over the gene network.Furthermore, the scaling of the coefficients β r,s respect to the degree (i.e., ffi ffi ffi ffi d s p ) allows the hub gene with more connections to have larger coefficients β r,s .The use of the quadratic Laplacian penalty enables us to incorporate the following knowledge of network biology into the eigen loading estimation [5].

• The genes linked in the networks may have similar biological functions
Using the network-constrained penalty, we encouraged the linked genes' similarity of coefficients in the eigenloadings estimation.The penalty term enabled us to locally smooth the network and encourage the simultaneous selection of related features, the large value of w st (i.e., strong edge between s th gene and t th gene) lead to a large amount of penalty to difference of β r,s and β r,t .It implies that the last term of our method leads to connected genes with strong edges have similar β r,s and β r,t (r = 1, . .., q) in loading matrix estimation.
• The hub genes linked with several genes play critical roles in gene regulation and biological processes [17].
The hub genes are crucial markers for understanding biological mechanisms.Our strategy imposed a relatively small penalty on the hub genes by re-scaling the coefficients with the square root of the degrees d s .Thus, the hub genes and their neighboring genes have relatively large coefficients and are easily selected.
In short, our strategy's eigenvalue estimation is based on crucial sub-networks consisting of the hub genes and their neighboring genes, which leads to a biologically reliable interpretation of the spectral graph analysis results and crucial markers identification.
Algorithm of Net-sEVD.To implement Net-sEVD, we adapted the iterative algorithm of the sparse PCA [8] Step 1. Initialize Â at V by setting it to the eigenvalue decomposition of L.
Step 3 For fixed B, compute SVD of Step 4 Repeat Steps 2-3 until convergence.

Differential gene regulatory networks identification based on Net-sEVD: ssDGN
Suppose the normalized Laplacian matrices L Y and L N that describe the estimated gene networks for p genes, where the networks are estimated by expression levels of phenotypes Y and N. By using the Net-sEVD, we first estimate eigenvalues λ s (L Y ) and λ s (L N ) of p × p matrices L Y and L N , respectively.We then measured the dissimilarity of gene regulatory networks based on the following distance d s (G Y , G N ) of eigenvalues in line with [3] where L Y and L N are the normalized Laplacian matrices of the graphs G Y and G N , and λ s (�) r is the r th eigenvalue of the normalized Laplacian matrix.
To identify differentially regulated gene networks, we considered the permutation p-value of d s (G Y , G N ).We draw permutation cell lines of phenotypes Y and N, respectively, and then estimate the permutation gene networks of p genes for the phenotypes Y and N, i.e., G Y pm and G N pm .The normalized Laplacian matrices L Y pm and L N pm are computed.The eigenvalues of the permutation matrices are estimated as l s ðL Y pm Þ and l s ðL N pm Þ.We then computed the permutation p-value of the distance as follows, where T is the number of permutations, is the distance between eigenvalues of the normalized Laplacian matrices for permutation gene networks G Y pm and G N pm , and T is several permutations.The small permutation p-value indicates that networks G Y and G N show significantly different gene regulatory structures between two phenotypes, Y and N. The differentially regulated gene network identification was performed by the permutation p-value with significance level α.

Regularization parameter selection
The results of the proposed Net-sEVD heavily relies on the regularization parameters γ 1 and γ 2 .Let A g 1 ;g 2 ¼ ½α g 1 ;g 2 1 ; . . .; α g 1 ;g 2 q � and B g 1 ;g 2 ¼ ½ β g 1 ;g 2 1 ; . . .; β g 1 ;g 2 q � be the estimates of A and B. We consider the following criterion to select the tuning parameters, Then, we select the regularization parameters that minimize (13).

Monte Carlo simulations
We illustrated the performance of the proposed ssDGN based on Monte Carlo simulations.We supposed two phenotypes, A and B, with sample sizes n A = n B = 100.We considered five common and five A-specific sub-networks, where each sub-network comprises ten genes.
In scenario 1, we generated five precision matrices for each common sub-network (i.e., O nw c , nw = 1,.., 5) and A-specific networks (i.e., O nw A , nw = 1,.., 5) from "random" graph structure using huge.generatoran R package Huge previously described [18].The expression levels of each of the ten genes in every five common networks were generated from Nð0 10 ; where the number of cell lines is n = 200.For the A-specific networks, the expression levels X A of each of the ten genes comprising each sub-network are generated from Nð0 10 ; where the sample size of phenotype A is n A = 100.We then generated the expression levels X B of phenotype B from N(0 50 , S 50 ), where S 50 is the diagonal matrix whose diagonal is σ 2 .Scenarios 2, 3, and 4 are similar to scenario 1 except that the precision matrices O nw c for nw = 1,.., 5 and O nw A for nw = 1, . .., 5 have "cluster," "scale-free," and "hub" graph structures for scenarios 2, 3, and 4, respectively.We use significance level α = 0.05 and extract the subnetwork corresponding p.value <0.05.In the differentially regulated gene network identification, we first estimate gene regulatory networks G A and G B based on the generated expression levels of genes for phenotypes A and B (X A and X B ), respectively.In the simulation study, we compute the correlation matrices of expression levels of genes X A and X B to describe gene network (i.e., correlation network) for phenotypes A and B.
where Cov(x i , x j ) is a covariance between expression levels of the i th and j th genes, and s x i is a standard deviation of the expression levels of the i th genes.The absolute values of correlation coefficients are used as edge weights W A and W B , We then compute the normalized Laplacian matrices L A and L B in (3).By using the proposed Net-sEVD, we compute eigenvalues of the normalized Laplacian matrices L A and L B and then extract phenotype specific sub-network (i.e., the subject of the proposed method Net-sEVD are the normalized Laplacian matrices L A and L B ).We evaluated ssDGN compared with similarity measures based on eigenvalues estimated by fully dense loading (i.e., ordinary EVD: oEVD), sparse loadings (sPCA) [8], sparse loadings (sPCA) without L 2 norm (sPCAwoL2), fused sparse loadings (sfPCA) [9] and sign adjusted quadratic Laplacian penalty β T r L s β r (ssDGNs), where L s = S T LS with S ¼ diagfsgnð br;1 Þ; sgnð br;2 ; . . .; sgnð br;p Þg.The permutation p-values were computed based on 500 permutation samples, and the simulations were repeated 50 times.To evaluate the performance of differential gene network identification, we compare accuracy, F-measure, true negative rate (TNR), recall, and precision.Tables 1-3 show the average of the results in 50 replications for σ 2 = 1, 3 and 5, respectively, where the bold values indicate the best performance among the methods.As shown in Tables 1-3, the proposed ssDGN effectively identified differential regulatory networks for scenarios 1, 3 and 4 in overall.The methods showed similar performance in the viewpoint of Recall (i.e., true positive rate), while our strategy outperformed true negative identification given in the column "TNR".It can be seen through Tables 1-3 that the proposed ssDGN provides effective performance for differentially regulated gene network identification, overall.We expected that our strategy will be a useful tool to identify phenotype-specific molecular inteplays that can decribed crucial characteristics of cancer mechanism.

Gastric cancer drug resistance specific gene networks identification
We applied ssDGN to identify differentially regulated gene networks between gastric cancer drug-sensitive and -resistant cell lines based on the publicly available Sanger Genomics of  Drug Sensitivity in Cancer (GDSC) dataset from the Cancer Genome Project.We considered four anti-cancer drugs, doxorubicin, mitomycin-c, 5-FU, and docetaxel, approved by the Food and Drug Administration (FDA), for gastric cancer.We used the log of IC50 values as the drug sensitivity and the expression levels of 10% of the genes with the highest variance in all cell lines, i.e., 404 genes are considered as candidate regulator and target genes.The cell lines were matched by the expression levels and the drug sensitivity, where 948, 855, 891, and 948 cell lines for doxorubicin, mitomycin-c, 5-FU, and docetaxel were matched, respectively.We then defined drug-sensitive and -resistant cell lines based on 1 st (D 1st ) and 3 rd (D 3rd ) quartiles of drug sensitivity (DS) of each drug, i.e., drug-sensitive cell lines: DS ST < D 1st and drug-resistant cell lines: DS RS > D 3rd .
In GDSC data analysis, the linear regression model to represent gene regulatory system is given as, where x ðÀ jÞ i ¼ ðx i1 ; . . .; x i;jÀ 1 ; x i;jþ1 ; . . .; x ip Þ, δ j = (δ j1 , . .., δ j,j−1 , δ j,j+1 , . .., δ jp ) T is the regression coefficient that represents the effect of p candidate regulator genes x i on j th target gene x ij and � ij is a random error vector for the j th target gene.For each j th target gene, we consider the remaining p = 403 genes as candidate regulator genes.The following lasso is used to select true regulator genes and estimate the gene networks.[19], where and λ > 0 is the regularization parameters of δ j .The weight of edges W = w ij is computed based on the effect of the i th gene to j th gene (i.e., δ ij ) and the j th gene to i th gene (i.e., δ ji ) as follows, Then, the normalized Laplacian matrix L is computed by the weight of edges W.
In order to identify the differentially regulated gene network, We first estimate drug sensitive and resistance -specific gene regulatory networks NW ST  ).For the sub networks consisting of the extracted edges, we identified differentially regulated gene networks between drug-sensitive and -resistant cell lines, where only sub-networks with nodes greater than five were considered.For doxorubicin, mitomycin-c, 5-FU, and docetaxel, 13, 13, 14, and 12 sub-networks were considered for differentially regulated gene network identification.We extracted sub-networks with a p-value smaller than 0.01, and 1, 1, and 2 networks were identified for doxorubicin, 5-FU, and docetaxel, respectively.There was no differentially regulated gene network for mitomycin-c.We then matched the identified 1, 1, and 3 sub networks with NW ST 1% and NW RS 1% , where the matched networks can be considered as drug sensitive and resistance -specific gene regulatory structures.
Fig 2 shows the identified drug sensitive and resistance -specific gene networks, where color of edges indicates drugs, i.e., red: 5-FU, blue: doxorubicin, green: docetaxel.As shown in Fig 2, drug sensitive and resistance cell lines show considerably different molecular interplays, where drug resistance cell lines have relatively active and dense gene regulatory system.It implies that our strategy specify drug response -specific gene regulatory structure.We also perform drug sensitive and resistance networks estimation based on the ordinary sparse PCA (sPCA), because sPCA also show effective performance in simulation studies.The sPCA provide relatively dense networks compared with our method, i.e., 26 (13) and 100 (78) edges are identified by ssDGN and sPCA, respectively, in drug resistance (sensitive) cell lines.It can be seen that all edges identified by our method in drug resistance and sensitive cell lines are also identified by sPCA (see S1 File, tab: sPCA).Although some of edges identified by sPCA are not extracted by our method, the gene networks constructed by our approaches can be considered as crucial molecular interplays for uncovering gastric cancer drug resistance mechanism.
In order to uncover drug resistance mechanisms of gastric cancer, we focus on the identified drug resistance-specific gene networks.Table 4 shows the identified gastric cancer drug resistance markers ASCL1, NOL4, BTK, ELAVL4, LAPTM5, SCG2 that have more than three edges in the drug resistance-specific networks and their evidences, where the columns "Gastric cancer" and "Gastric cancer drugs" indicate literature related to the mechanism of gastric cancer and anti-cancer drugs uncovered by previous studies, respectively.We also perform gastric cancer resistance marker identification by using sPCA.The sPCA identify the following genes, ASCL1 (13), EPS8L3 (11), LGALS4(9), HSPA1A(8), NOL4(8), ELAVL4(6), IFIT3(4), SCG2(4), , where the numbers in parentheses indicate the number of edges.All gastric resistance markers identified our method in Table 4 are also identified by the sPCA.It implies that the genes can be considered as crucial markers that are strongly supported by data-driven approaches.

Gastric cancer
ASCL1 expression levels have been considered characteristic of gastric cancer, and ASCL1 overexpression was identified as a signature of gastric tumor [21].ASCL1 was identified as a differentially expressed gene in most gastric cancer subtypes [20].High expression levels of SCG2 were also identified in subtype gastric cancer [34,35].BTK mutation was amplified in gastric cancer, which plays a critical role in anti-cancer drug resistance [28,29].BTK can be a therapeutic gastric cancer target since BTK expression knockdown inhibits the gastric cancer cell growth [24].The ELAVL4 and LAPTM5 genes were also identified as differentially  expressed genes in gastric cancer [31,32].Furthermore, SCG2 is reportedly crucial to study therapeutic strategies to treat gastric tumors [34].

Chemotherapy of gastric cancer
Targeted ASLC1 delivery of siRNA and doxorubicin could be a potential strategy for effective neuroendocrine tumor treatment [23].Ovarian cancer stem-like cells are highly resistant to cisplatin which is attributable to the BTK signaling overexpression [24].CTN06, a novel BTK and ETK dual inhibitor reportedly re-sensitized cell lines to docetaxel [25].Docetaxel substantially increased IRF3, BTK, and DDX41 expression levels [26].BTK inhibitors reportedly synergize with 5-FU to treat drug-resistant TP53-null colon cancers, and using BTK inhibitors in combination with 5-FU is a novel therapeutic approach for colorectal cancer [27].Additionally, doxorubicin treatment for EBNA2-positive DLBCL cells is effectively complemented with a NF-κB or BTK inhibitor [30].LAPTM5 was identified as a novel marker to enhance the resistance to docetaxel in breast cancer cells [33].SCG2 was identified as a prognostic marker for chemotherapy in colorectal cancer by 5-FU [36].
As shown in Table 4, the drug resistance markers identified by our strategy have strong evidence as markers for gastric cancer and chemotherapy.The drug resistance-specific molecular interplays in the right side of Fig 2 may provide crucial clues to uncover the acquired drug resistance mechanisms of gastric cancer.Therefore, suppressing marker activities and their interplays can provide crucial clues to enhance the sensitivity to gastric cancer drugs.
To identify biological processes involved in the identified drug resistance (sensitive) markers, we performed gene enrichment analysis using the bioinformatics tool Database for Annotation, Visualization and Integrated Discovery (DAVID) [37].DAVID is a web server for functional enrichment analysis and functional annotation of gene lists.GO term annotations are performed on these lists of genes with category "Molecular Function (MF)", "Cellular Component (CC)" and "Biological Processes (BP)" based on species and databases [38].We use the 26 (14) identified differentially regulated network in drug resistance (sensitive) cell lines as input for GO term pathway analysis.
Fig 3 shows the significant pathways selected with p.value < 0.05, -log(p.value)and genes comprising each pathway.As shown in Fig 3, the linked genes are enriched in a pathway, e.g., gene COL4A1 and PXDN are involved in "Extracellular matrix structural constituent", "Extracellular matrix organization" and "Extracellular matrix" pathways.The S100A16 and S100A16 are involved in "Calcium-dependent protein binding" and "Plasma membrane" pathways.Furthermore, the genes consisting the subnetwork in bottom of drug sensitive specific cell lines, i.e., TNFRSF12A, S100A16, AHNAK2, MYOF, S100A13 and TM4SF1 are enriched in 'Plasma membrane" pathway.The result implies that the genes linked in the networks have similar biological functions, and incorporation of the knowledge may provide biologically reliable results.GO term pathway analysis revealed that the gastric drug resistance markers are significantly enriched in the "Extracellular" associated pathway, especially the markers in "Extracellular matrix" (ECM) related pathways.The definition of the enriched "Extracellular" associated pathways is given in "THE GENE ONTOLOGY RESOURCE" (http:// geneontology.org/)as follows

Extracellular matrix:
A structure lying external to one or more cells providing structural support and biochemical or biomechanical cues for cells or tissues.

Extracellular region:
The space external to the outer structure of a cell.This region refers to space outside the plasma membrane for cells without external protective or encapsulating structures.This term covers the host cell environment outside an intracellular parasite.

Extracellular matrix organization:
A process carried out at the cellular level resulting in the assembly, arrangement of constituent parts, or disassembly of an extracellular matrix.Extracellular matrix structural constituent: The action of a molecule that contributes to the structural integrity of the extracellular matrix.

Extracellular space:
The part of a multicellular organism outside the cell, usually outside the plasma membranes and occupied by fluid.
Increased ECM stiffness has been considered a key hallmark of cancer because the ECM plays a crucial role in accelerating tumor progression, reducing the cure ratio, and promoting chemotherapy resistance of cancer cell lines [39].The ECM can hinder chemotherapy effects by acting as a physical barrier, impeding drug access to the tumor [40].In breast cancer cell lines, increasing matrix stiffness leads to drug resistance of cancer cell lines by impeding drug delivery ability and reducing drug sensitivity [39].Depleted ECM components were demonstrated as essential features relating to 5-FU beneficial responses in patients with gastric cancer [41].Enriched ECM components have been recognized as one of the primary reasons causing anti-cancer drug resistance as stiffened ECM impeded the drugs and immune cell infiltration into the tumor [41].ECM-mediated chemoresistance is caused by a decrease in cell proliferation Keeratichamroen et al. [42], and proteins confer cell adhesion-mediated drug resistance [43].Gene expression analysis revealed that ECM protein genes COL3A1, COL5A2, and COL15A1 are significantly upregulated in drug-resistant ovarian cancer cells [44].
We consider COL4A1, PXDN, and TGFBI key markers of the "Extracellular" associated pathways.

• COL4A1
COL4A1 has been considered a crucial marker to drive trastuzumab resistance in gastric cancer by performing protein/gene interactions and biological process annotation analyses [45].COL4A1 shows overexpression in gastric cancer tissues and is upregulated in trastuzumab-resistant gastric cancer cells.

• PXDN
The high PXDN expression level has been considered a signature of drug resistance in cancer cells, and it was suggested that patients with a high PXDN expression are more likely to develop drug resistance [46].
• TGFBI TGFBI and its pathway are well-known drug resistance markers.The loss of TGFBI is reportedly sufficient to induce chemotherapy resistance by paclitaxel [47].Furthermore, TGFBI knockdown leads to SKOV-3 cells being resistant to paclitaxel, and TGFBI was considered a marker for paclitaxel-resistant ovarian cancer cell line [48].GFBI hypermethylation was reportedly considerably associated with trastuzumab resistance in HER2+ breast cancer patients [49].
These findings imply that uncovering the COL4A1, PXDN, and TGFBI mechanisms consisting of the ECM-related pathways are crucial to understanding mechanisms underlying acquired gastric cancer drug resistance.We also perform GOterm analysis for genes out of the identified networks given in Although we did not incorporate any biomedical knowledge into the ssDGN identification, our data-driven analysis provided biologically reliable and valuable results to uncover mechanisms involved in gastric cancer drug resistance.Our GO term pathway analysis results strongly suggest that suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays involved in the extracellular-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer.

Discussion
In order to uncover crucial molecular interplay that plays key roles in the mechanism of gastic cancer drug resistance, we developed a novel strategy for differentially regulated gene network identification based on sparse spectral graph analysis.The proposed ssDGN estimates the eigenvalue of graph Laplacian matrices based on sparse eigen loading.Thus, we can effectively perform differential gene network identification without disturbance of noise features.The sparse eigenanalysis also leads to interpretable results; for example, we can identify crucial features to describe graph structure.In this study, we considered computational efficiency and biological interpretability and incorporated network biology knowledge into the eigenanalysis of the graph structure.Therefore, our strategy can provide biologically reliable and interpretable results.
We performed Monte Carlo simulations to illustrate the proposed strategy.The simulation results showed that the proposed ssDGN provides outstanding differential gene network identification performance.We applied our strategy to identify differentially regulated gene networks between gastric cancer drug-sensitive and -resistant cell lines.For the gastric cancer drugs, doxorubicin, Mitomycin-C, 5-FU, and Docetaxel, we estimated each drug's sensitivityand resistance-specific gene network and then identified the differentially regulated sub-networks between two phenotypes.The identified gastric drug resistance-specific molecular interplays and markers have strong evidence for mechanisms related to gastric cancer and its chemotherapy.The GO term pathway analysis results show that ECM-associated pathways and their members, COL4A1, PXDN, and TGFBI, are involved in gastric cancer's acquired drug resistance mechanism.We suggest that suppression and/or induction of COL4A1, PXDN, and TGFBI, and their molecular interplays involved in the ECM-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer.
Although the ssDGN shows effective results for differentially regulated gene network identification, our method cannot distinguish the difference between the following two networks G1 = {g1, g2 − g3} and G2 = {g1 − g2, g3} where the edge strength of g1-g2 and g2-g3 are same, because the eigenvalue of two normalized Laplacian matrices are same.This point is obvious limitation of our method.Further work remains to be done towards developing a method that can incorporate not only structure but also coordinates of network.
Fig 1 shows the graph structures of the common sub-networks and A-specific sub-networks.

Fig 3 .
Fig 3. Gene Ontology (GO) term pathway analysis of gastric cancer drug resistance (sensitive) markers and genes out of network, where red and blue indicate drug resistance and sensitive -specific pathways, respectively.https://doi.org/10.1371/journal.pone.0305386.g003

Fig 2 .
The bottom (gray) of Fig 3 shows the most enriched 10 pathways of the genes.As shown in Fig 3, "Nucleoplasm", "Cytosol", and "Protein binding" are the most enriched pathways for the genes not in the drug sensitive and resistance networks.All pathways for the genes are given in S1 File.It can be seen that the genes in the network of drug sensitive and resistance cell lines are involved in the specific pathways, not in general pathway (bottom of Fig 3).

Table 1 .
and NW RS based on DSST and   (Continued)

Table 2 . Simulation results: Differential gene network identification σ 2 = 3.
DSRS.From the two gene networks, we extracted each 1% edges with the largest absolute value of edge size (i.e., NW ST 1% and NW RS 1%